buggy program
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (0.69)
- Education > Curriculum > Subject-Specific Education (0.46)
- (2 more...)
- Research Report (0.68)
- Workflow (0.49)
- Law (0.68)
- Information Technology > Security & Privacy (0.48)
- Education > Curriculum > Subject-Specific Education (0.46)
- Education > Educational Setting (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.77)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)
Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces
Silva, André, Thorén, Gustav, Monperrus, Martin
Automatic program repair seeks to generate correct code from buggy programs, with most approaches searching the correct program in a discrete, symbolic space of source code tokens. This symbolic search is fundamentally limited by its inability to directly reason about program behavior. We introduce Gradient-Based Program Repair (GBPR), a new paradigm that reframes program repair as continuous optimization in a differentiable numerical program space. Our core insight is to compile symbolic programs into differentiable numerical representations, enabling search in the numerical program space directly guided by program behavior. To evaluate GBPR, we present RaspBugs, a new benchmark of 1,466 buggy symbolic RASP programs and their respective numerical representations. Our experiments demonstrate that GBPR can effectively repair buggy symbolic programs by gradient-based optimization in the numerical program space, with convincing repair trajectories. To our knowledge, we are the first to state program repair as continuous optimization in a numerical program space. Our work establishes a new direction for program repair research, bridging two rich worlds: continuous optimization and program behavior.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning
Liu, Yujian, Ji, Jiabao, Zhang, Yang, Guo, Wenbo, Jaakkola, Tommi, Chang, Shiyu
Existing LLM-based automatic test generation methods mainly produce input and expected output pairs to categorize the intended behavior of correct programs. Although straightforward, these methods have limited diversity in generated tests and cannot provide enough debugging information. We propose HarnessLLM, a two-stage training pipeline that enables LLMs to write harness code for testing. Particularly, LLMs generate code that synthesizes inputs and validates the observed outputs, allowing complex test cases and flexible output validation such as invariant checking. To achieve this, we train LLMs with SFT followed by RLVR with a customized reward design. Experiments show that HarnessLLM outperforms input-output-based testing in bug finding and testing strategy diversity. HarnessLLM further benefits the code generation performance through test-time scaling with our generated test cases as inference-phase validation. Our code is available at https://github.com/UCSB-NLP-Chang/HarnessLLM.git.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Jordan (0.04)
- Law (1.00)
- Information Technology (1.00)
- Education (1.00)
- Government (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.67)
- Research Report (0.68)
- Workflow (0.49)
- Information Technology > Security & Privacy (0.48)
- Education > Curriculum > Subject-Specific Education (0.46)
- Education > Educational Setting (0.46)
- Education > Educational Technology > Educational Software > Computer Based Training (0.34)
- North America > United States (0.04)
- North America > Canada (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Instructional Material (0.68)
- Research Report (0.46)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization
Orvalho, Pedro, Janota, Mikoláš, Manquinho, Vasco
Automated Program Repair (APR) for introductory programming assignments (IPAs) is motivated by the large number of student enrollments in programming courses each year. Since providing feedback on IPAs requires substantial time and effort from faculty, personalized feedback often involves suggesting fixes to students' programs. Formal Methods (FM)-based semantic repair approaches, check a program's execution against a test suite or reference solution, are effective but limited. These tools excel at identifying buggy parts but can only fix programs if the correct implementation and the faulty one share the same control flow graph. Conversely, Large Language Models (LLMs) are used for APR but often make extensive instead of minimal rewrites. This leads to more invasive fixes, making it harder for students to learn from their mistakes. In summary, LLMs excel at completing strings, while FM-based fault localization excel at identifying buggy parts of a program. In this paper, we propose a novel approach that combines the strengths of both FM-based fault localization and LLMs, via zero-shot learning, to enhance APR for IPAs. Our method uses MaxSAT-based fault localization to identify buggy parts of a program, then presents the LLM with a program sketch devoid of these buggy statements. This hybrid approach follows a CEGIS loop to iteratively refine the program. We ask the LLM to synthesize the missing parts, which are then checked against a test suite. If the suggested program is incorrect, a counterexample from the test suite is fed back to the LLM. Our experiments show that our counterexample guided approach, using MaxSAT-based bug-free program sketches, significantly improves the repair capabilities of all six evaluated LLMs. This method allows LLMs to repair more programs with smaller fixes, outperforming other configurations and state-of-the-art symbolic program repair tools.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (16 more...)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- Overview > Innovation (0.34)
Effective Large Language Model Debugging with Best-first Tree Search
Song, Jialin, Raiman, Jonathan, Catanzaro, Bryan
However, their code-writing abilities are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A fundamental difference with how an LLM writes code, compared to a human programmer, is that it cannot consistently spot and fix bugs. Debugging is a crucial skill for programmers and it enables iterative code refinement towards a correct implementation. In this work, we propose a novel algorithm to enable LLMs to debug their code via self-reflection and search where a model attempts to identify its previous mistakes. Our key contributions are 1) a best-first tree search algorithm with self-reflections (BESTER) that achieves state-of-the-art Pass@1 in three code generation benchmarks. BESTER maintains its superiority when we measure pass rates taking into account additional inference costs incurred by tree search.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > British Indian Ocean Territory > Diego Garcia (0.04)